Radius Plots for Mining Tera-byte Scale Graphs: Algorithms, Patterns, and Observations

Authors

  • U. Kang
  • Charalampos E. Tsourakakis
  • Ana Paula Appel
  • Christos Faloutsos
  • Jure Leskovec
Abstract

Given large, multi-million-node graphs (e.g., Facebook, web crawls), how do they evolve over time? How are they connected? What are the central nodes and the outliers of the graphs? We show that the Radius Plot (pdf of node radii) can answer these questions. However, computing the Radius Plot is prohibitively expensive for graphs reaching the planetary scale. There are two major contributions in this paper: (a) We propose HADI (HAdoop DIameter and radii estimator), a carefully designed and fine-tuned algorithm to compute the diameter of massive graphs, which runs on top of the HADOOP/MAPREDUCE system with excellent scale-up on the number of available machines. (b) We run HADI on several real-world datasets including YahooWeb (6B edges, 1/8 of a Terabyte), one of the largest public graphs ever analyzed. Thanks to HADI, we report fascinating patterns on large networks, like the surprisingly small effective diameter, the multi-modal/bi-modal shape of the Radius Plot, and its palindrome motion over time.
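To make the term "Radius Plot" concrete, here is a minimal Python sketch that computes it exactly on a toy graph. It is not HADI: it runs plain breadth-first search in memory on a single machine, which is precisely what becomes infeasible at the scale the abstract describes. It also assumes that a node's radius is its eccentricity (the largest hop distance to any node reachable from it); the abstract does not spell out the definition. The function names and the example graph are hypothetical.

```python
from collections import Counter, deque


def eccentricity(adj, source):
    """Largest BFS hop distance from `source` to any node reachable from it.

    Assumption: this is used as the node's "radius"; the abstract does not
    give the exact definition.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def radius_plot(adj):
    """Return (radius per node, empirical pdf of the radii)."""
    radii = {node: eccentricity(adj, node) for node in adj}
    counts = Counter(radii.values())
    n = len(radii)
    pdf = {r: c / n for r, c in sorted(counts.items())}
    return radii, pdf


# Hypothetical toy input: an undirected path graph 0-1-2-3-4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

radii, pdf = radius_plot(adj)
diameter = max(radii.values())  # the diameter is the largest radius

print(radii)
print(pdf)
print(diameter)
```

One BFS per node costs on the order of (nodes × edges) in total, which is why an exact computation like this does not carry over to graphs with billions of edges.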

Similar resources

Mining Billion-Scale Graphs: Patterns and Algorithms

Graphs are everywhere: social networks, the World Wide Web, biological networks, and many more. The sizes of graphs are growing at an unprecedented rate, spanning millions and billions of nodes and edges. What are the patterns in large graphs, spanning Giga, Tera, and heading toward Peta bytes? What are the best tools, and how can they help us solve graph mining problems? How do we scale up algori...

Research Statement - Tera-Scale Graph Analysis

My vision is to design and implement a big data analytics system that finds useful patterns and anomalies in graphs. Graphs are ubiquitous: computer networks, social networks, mobile call networks, protein regulation networks, and the World Wide Web, to name a few. The large volume of available data, the low cost of storage and the stunning success of online social networks and Web2.0 applicatio...

Mining Tera-Scale Graphs: Theory, Engineering and Discoveries

How do we find patterns and anomalies on graphs with billions of nodes and edges, which do not fit in memory? How can we use parallelism for such Tera- or Peta-scale graphs? In this thesis, we propose PEGASUS, a large-scale graph mining system implemented on top of the HADOOP platform, the open source version of MAPREDUCE. PEGASUS includes algorithms which help us spot patterns and anomalous beh...

High Performance Computing and I/O Architectures for Database and Knowledge Discovery: The System Design Perspective

Research in parallel database (DB) and data mining (DM) algorithms has experienced significant growth due to advancements in high performance computing (HPC) systems. Enabling technologies such as multi-core processors, object-based storage and high-bandwidth interconnects helped propel innovations to address fast-increasing demands in scientific and commercial computing. Large-scale applicat...

Classification Rules and Genetic Algorithm in Data Mining

Databases today range in size into the terabytes. Data mining is an information extraction activity whose goal is to discover hidden facts contained in databases. Typical applications include market segmentation, customer profiling, fraud detection, evaluation of retail promotions, and credit risk analysis. Major data mining tasks and processes include Classification, Clustering, Associations, Vi...

Publication date: 2010